COMP124 - Week 2 Assembly

Machine Code Programming

CPU only sees binary in memory
Instead of writing machine code in binary, assembly is used for ease
- Low level
- Opcodes represented by mnemonics
- Registers have names
- Memory addresses specified with labels
An assembler is used to turn assembly into machine code
- Not the same as compilation of high level code
- Each assembly line directly translates to one machine code instruction
A label can refer to the address of an instruction, or the address of a data item
The mnemonic and its operands are directly translated into machine code

adjust: mov eax, num1 ; put number into register

Here, adjust is a label (pointing to an instruction address)
mov is the mnemonic for the instruction, followed by its two operands: eax (a register) and num1 (a label for a data item)
Semicolons start comments
Instruction labels and comments are optional, so this line can be written more simply as
mov eax, num1

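To run this code from a C++ program (inline assembly, in the block form accepted by Microsoft Visual C++ when targeting 32-bit x86):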

#include <stdio.h>
#include <stdlib.h>

int main (void) {
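	int num = 10; // variables are declared outside the assembly block

	// assembly block:
	_asm {
		mov eax, num   ; load num into the accumulator
		add eax, 12    ; add 12 to it
		mov num, eax   ; store the result back into num
	}
	printf("%d\n", num); // prints 22
	return 0;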
}

Intel x86 Registers

Lots of registers, but only some needed for these purposes
IP (instruction pointer) and IR (instruction register) are registers that have been mentioned already
Four main general purpose registers:
- EAX - accumulator
- EBX - base register
- ECX - counter register
- EDX - data register
These have designated roles, but can be used for any purpose
- EAX usually used for calculations
- ECX usually used for keeping track of loop iterations

The Accumulator

RAX - 64 bits of the accumulator
EAX - lowest 32 bits of the accumulator
AX - lowest 16 bits of the accumulator
AH - Upper 8 bits of the AX
AL - Lower 8 bits of the AX
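
The names above are different views of the same register, not separate registers. A minimal C++ sketch of that overlap, using bit masks on a 64-bit value (the helper names eax_of, ax_of, ah_of, al_of and write_al are made up for illustration):

```cpp
#include <cstdint>

// Hypothetical helpers: each returns the part of a 64-bit RAX value
// that the named sub-register exposes.
constexpr std::uint32_t eax_of(std::uint64_t rax) {
    return static_cast<std::uint32_t>(rax & 0xFFFFFFFFu);  // lowest 32 bits
}
constexpr std::uint16_t ax_of(std::uint64_t rax) {
    return static_cast<std::uint16_t>(rax & 0xFFFFu);      // lowest 16 bits
}
constexpr std::uint8_t ah_of(std::uint64_t rax) {
    return static_cast<std::uint8_t>((rax >> 8) & 0xFFu);  // upper byte of AX
}
constexpr std::uint8_t al_of(std::uint64_t rax) {
    return static_cast<std::uint8_t>(rax & 0xFFu);         // lower byte of AX
}

// Writing AL replaces only the lowest byte and leaves the other bits alone.
constexpr std::uint64_t write_al(std::uint64_t rax, std::uint8_t value) {
    return (rax & ~std::uint64_t{0xFF}) | value;
}
```

For example, if RAX holds 0x1122334455667788 then AX is 0x7788, AH is 0x77 and AL is 0x88.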

Example code:

Put 42 into accumulator:
- mov eax, 42
Move a 16-bit variable into the lowest 16 bits of the accumulator (count is the variable's label)
- mov ax, count
Move ascii value of 'x' into lowest byte of accumulator
- mov al, 'x'
Increment accumulator
- inc eax
Add 10 to accumulator
- add eax, 10

Note: The first operand is the destination, the second operand is the source

A move cannot have two memory operands: you cannot move data directly from memory to memory, so a copy between two variables has to go through a register. Also, for move operations the source operand is not changed or erased.

Basic Maths

For a basic high level instruction like

int num = count1 + count2 - 10;

In assembly, the accumulator stores the result of each step (accumulating the answer)

mov eax, count1
add eax, count2
sub eax, 10
mov num, eax

Addition and subtraction work as expected. Multiplication takes only one operand: the value to multiply the accumulator by. For example, to calculate 10*12

mov eax, 10
mov ebx, 12
mul ebx

^ This leaves 120 in EAX (the upper 32 bits of the product go into EDX, which is 0 here)

Division

Some things need to be set up first
- Dividend formed from EDX (high 32 bits) and EAX (low 32 bits)
- Divisor stored in another register
This performs integer division, so there may be a remainder
Quotient stored in EAX and remainder stored in EDX
For 120/9

mov ebx, 9
mov edx, 0
mov eax, 120
div ebx

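^ This leaves 13 in EAX (quotient) and 3 in EDX (remainder)

The operation raises a divide error (an exception, not a status flag) if the quotient is too big to fit in EAX or division by zero is attempted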
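
The setup for div can be modelled in plain C++. This is only a sketch of the arithmetic, not how the CPU implements it: the function name div32 and the struct DivResult are made up, and the cases where the real instruction cannot produce a result are reported through a hypothetical ok field.

```cpp
#include <cstdint>

struct DivResult {
    bool ok;            // false where the real div cannot produce a result
    std::uint32_t eax;  // quotient
    std::uint32_t edx;  // remainder
};

// Models `div divisor` with the dividend held in EDX:EAX.
DivResult div32(std::uint32_t edx, std::uint32_t eax, std::uint32_t divisor) {
    if (divisor == 0) {
        return DivResult{false, 0, 0};  // division by zero
    }
    // EDX supplies the high 32 bits of the dividend, EAX the low 32 bits.
    const std::uint64_t dividend = (static_cast<std::uint64_t>(edx) << 32) | eax;
    const std::uint64_t quotient = dividend / divisor;
    if (quotient > 0xFFFFFFFFu) {
        return DivResult{false, 0, 0};  // quotient does not fit in EAX
    }
    return DivResult{true,
                     static_cast<std::uint32_t>(quotient),
                     static_cast<std::uint32_t>(dividend % divisor)};
}
```

With edx = 0, eax = 120 and divisor = 9, this gives a quotient of 13 and a remainder of 3. Forgetting to clear EDX first changes the dividend, which is why the assembly sets it to 0.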

Status flags

Important flags:

CF - carry flag - previous operation had a carry out of (or a borrow into) the most significant bit
ZF - zero flag - previous operation had a zero result
SF - sign flag - result of previous operation was positive (0) or negative (1)
OF - overflow flag - signed result of previous operation was too big to fit in the destination
We can use jump instructions to check flags and take appropriate action
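
As a rough illustration of what the four flags mean, here is a C++ sketch that works out the flags a 32-bit add would produce. The struct Flags and the function flags_after_add are made-up names; the CPU does this in hardware as a side effect of the instruction.

```cpp
#include <cstdint>

struct Flags {
    bool carry;     // CF
    bool zero;      // ZF
    bool sign;      // SF
    bool overflow;  // OF
};

// Flags produced by a 32-bit `add a, b`.
Flags flags_after_add(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t result = a + b;  // unsigned arithmetic wraps modulo 2^32
    Flags f;
    f.carry = result < a;                // wrapped around, so a carry left bit 31
    f.zero = result == 0;
    f.sign = (result >> 31) != 0;        // bit 31 is the sign bit
    // Signed overflow: both inputs share a sign and the result's sign differs.
    f.overflow = ((~(a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}
```

Adding 1 to 0xFFFFFFFF sets CF and ZF but not OF, while adding 1 to 0x7FFFFFFF sets SF and OF but not CF.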

Unconditional Jump

An unconditional jump will move the IP to the given address label

       mov eax, 10
begin: add eax, 10
       jmp begin

The above code is an infinite loop that keeps adding 10 to EAX
Eventually, EAX would get too big and overflow
Jumps are unrestricted, so take care to avoid messy code with jumps all over the place

Conditional Jumps

A conditional jump happens if a certain condition is true
If the condition is false, the IP moves to the next instruction
Jump instructions:

jc - jump if carry flag is set
jnc - jump if carry flag is not set
jz - jump if zero flag is set
jnz - jump if zero flag is not set
js - jump if sign flag is set
jns - jump if sign flag is not set
jo - jump if overflow flag is set
jno - jump if overflow flag is not set

eg.

num = num - 10;
if (num==0) {
	num = 100;
}

this code in assembly would look like:

		mov eax, num
		sub eax, 10
		jnz store
		mov eax, 100
store:  mov num, eax
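
To see how the jump skips over the assignment, the same control flow can be written in C++ with goto, one statement per instruction. This is a sketch only: adjust_num is a made-up function name and eax is an ordinary local variable standing in for the register.

```cpp
// Mirrors the assembly above line by line.
int adjust_num(int num) {
    int eax = num;             // mov eax, num
    eax = eax - 10;            // sub eax, 10 (sets ZF when the result is 0)
    if (eax != 0) goto store;  // jnz store: skip the next line unless the result was 0
    eax = 100;                 // mov eax, 100
store:
    num = eax;                 // mov num, eax
    return num;
}
```

Note that the assembly tests the opposite condition to the C++ if statement: it jumps when the result is not zero, so that the body of the if is skipped.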

Comparing Values

The cmp instruction compares two values
Internally, it subtracts the second operand from the first and sets the flags, without changing either operand
If both values are the same, the zero flag is set
cmp eax, ebx
Placing a cmp before one of these jump instructions lets the jump act on the result of the comparison:
je - jump if operands are equal
jne - jump if operands are not equal
jg/jnle - jump if the first operand is greater
jle/jng - jump if the first operand is less than or equal
jl/jnge - jump if first operand is less than
jge/jnl - jump if first operand is greater than or equal
jg, jge, jl and jle treat the operands as signed numbers
These only work as expected if no instruction that changes the flags comes between the compare and the jump
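
The jumps in this list never see the operands, only the flags left behind by the subtraction. A C++ sketch of that idea, with made-up names (CmpFlags, cmp32, takes_je, takes_jl, takes_jg), covering three of the jumps:

```cpp
#include <cstdint>

struct CmpFlags {
    bool zero;      // ZF
    bool sign;      // SF
    bool overflow;  // OF
};

// Flags left by `cmp a, b`: computes a - b and throws the difference away.
CmpFlags cmp32(std::int32_t a, std::int32_t b) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    const std::uint32_t diff = ua - ub;  // wraps modulo 2^32
    CmpFlags f;
    f.zero = diff == 0;
    f.sign = (diff >> 31) != 0;
    // Signed overflow on subtraction: the operands have different signs
    // and the sign of the difference differs from the first operand's.
    f.overflow = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
    return f;
}

bool takes_je(const CmpFlags& f) { return f.zero; }
bool takes_jl(const CmpFlags& f) { return f.sign != f.overflow; }
bool takes_jg(const CmpFlags& f) { return !f.zero && f.sign == f.overflow; }
```

Because the decision depends only on the flags, any flag-changing instruction between the cmp and the jump can change which way the jump goes.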

If-Else in Assembly

if (num>0){
	pos = pos+num;
} else {
	neg = neg+num;
}

would be

		mov eax, num
		cmp eax, 0
		jg postv
negtv:  add neg, eax
		jmp endif
postv:  add pos, eax
endif:  ..
		..

Loops

Can loop over instructions by jumping backwards
ECX can be used in conjunction with the loop instruction
- Load the number of iterations into ECX (eg. 10)
- At the end of the code to iterate over, add loop [label], where label is the label of the first line of the loop
- loop first decrements ECX by one, then jumps to the label if ECX is not zero
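
Since the test comes at the end, the loop instruction behaves like a C++ do-while: the body always runs at least once. A minimal sketch, where iterations_of_loop is a made-up name and ecx is a local variable standing in for the register:

```cpp
#include <cstdint>

// Counts how many times the loop body runs for a given starting ECX.
std::uint64_t iterations_of_loop(std::uint32_t ecx) {
    std::uint64_t runs = 0;
    do {
        ++runs;          // the code being iterated over
        --ecx;           // loop: decrement ECX by one...
    } while (ecx != 0);  // ...then jump back if ECX is not zero
    return runs;
}
```

Starting with ECX = 10 runs the body 10 times. Starting with ECX = 0 does not skip the loop: the decrement wraps around to 0xFFFFFFFF, so the body would run 2^32 times.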

Labels & Memory Addresses

A label just points to a memory address
In the C++ code, we declare a variable name and optionally give it a value
- int age = 21;
In the assembly code, we can use the label to refer to the variable in memory
- mov eax, age
But if you want the memory address of the variable, not its value, you use lea (Load Effective Address)
- lea ebx, age
And if we have a memory address stored in a register, we can use register indirect mode to get the value stored in that location
- mov eax, [ebx]
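
These three instructions line up closely with C++ pointers: lea is like taking an address with &, and [ebx] is like dereferencing with *. A small sketch of the analogy (read_via_address is a made-up name, and the locals named after registers are ordinary variables):

```cpp
// Reads a variable directly, then again through its address.
int read_via_address() {
    int age = 21;
    int eax = age;    // mov eax, age   : copy the value
    int* ebx = &age;  // lea ebx, age   : take the address, not the value
    eax = *ebx;       // mov eax, [ebx] : read the value stored at that address
    return eax;
}
```

Both reads give 21. The difference is that ebx can later be changed to point somewhere else, which is what makes array processing possible.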

Arrays

Arrays are just items stored in consecutive memory locations
The amount of memory depends on the type of data being stored in the array
In a 32-bit system (eg. Intel x86), each integer takes up 4 bytes of memory
- int grades[4] = {64, 78, 60, 55};
We first get the memory address of the array:
- lea ebx, grades
To get the value stored in the second array item, we add 4 to the address
- add ebx, 4
- mov eax, [ebx]

Array Processing

With this, we can loop through an array and sum its contents
In the C++ part of the code, we define an array with 4 items:
- int grades[4] = {64, 78, 60, 55};
In the assembly code we set up the loop
- lea ebx, grades ; load memory location of array into ebx
- mov ecx, 4 ; set loop counter to 4
- mov eax, 0 ; set eax to 0
- floop: add eax, [ebx] ; add value in current memory location in array to eax
- add ebx, 4 ; go to next memory location in array
- loop floop ; loop with ecx
This uses 3 registers
- EAX - stores the sum as we go along
- EBX - stores the memory address of the current item in the array
- ECX - loop counter
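
The same loop can be sketched in plain C++ with a pointer playing the part of EBX. This is an illustration of the logic rather than the inline assembly itself: sum_grades is a made-up name, and it assumes 4-byte ints so that stepping the pointer by one int matches add ebx, 4.

```cpp
// Sums the array the same way the assembly does.
int sum_grades() {
    int grades[4] = {64, 78, 60, 55};
    const int* ebx = grades;  // lea ebx, grades
    unsigned ecx = 4;         // mov ecx, 4
    int eax = 0;              // mov eax, 0
    do {
        eax += *ebx;          // floop: add eax, [ebx]
        ++ebx;                // add ebx, 4 (one int further along)
        --ecx;                // loop floop: decrement ECX...
    } while (ecx != 0);       // ...and repeat while it is not zero
    return eax;
}
```

For the grades above the sum is 64 + 78 + 60 + 55 = 257.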

Subroutines

Parameter passing is tricky in assembly
Subroutine calls just change the IP (but getting back where you were before is tricky)
No local registers in subroutines (a subroutine shares the same registers as its caller)
No fancy way to specify parameters and their types
Use the call instruction with the label of the first line of the subroutine
Use the ret instruction to return from the subroutine
If you don't return, the IP will just go to the next instruction in memory
When a subroutine is called, the IP is changed to its address
- Fetch-execute cycle continues with instructions from that point onward
- The ret instruction changes the IP back to the address following the original call instruction
- So the CPU must remember where to return to
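
A toy model of call and ret may make the "remember where to return to" step clearer. Everything here is hypothetical: the tiny instruction set (Op), the Instr struct and the run function are invented for illustration. On x86 the return address is saved on the stack; the vector return_addresses stands in for it.

```cpp
#include <cstddef>
#include <vector>

enum class Op { AddTen, Call, Ret, Halt };

struct Instr {
    Op op;
    std::size_t target;  // only used by Call: index of the subroutine's first instruction
};

// Runs the toy program and returns the final value of EAX.
int run(const std::vector<Instr>& program) {
    int eax = 0;
    std::size_t ip = 0;
    std::vector<std::size_t> return_addresses;  // stand-in for the saved return addresses
    while (ip < program.size()) {
        const Instr instr = program[ip];
        ++ip;  // IP now points at the instruction after the current one
        switch (instr.op) {
            case Op::AddTen:
                eax += 10;
                break;
            case Op::Call:
                return_addresses.push_back(ip);  // remember where to come back to
                ip = instr.target;               // continue from the subroutine
                break;
            case Op::Ret:
                if (return_addresses.empty()) return eax;  // nothing to return to
                ip = return_addresses.back();    // back to the line after the call
                return_addresses.pop_back();
                break;
            case Op::Halt:
                return eax;
        }
    }
    return eax;
}
```

A program that calls a one-line AddTen subroutine twice and then halts ends with EAX = 20, because each ret resumes at the instruction following its own call.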